Techniques for Inverted Index Compression
Authors
Abstract
The data structure at the core of large-scale search engines is the inverted index, which is essentially a collection of sorted integer sequences called inverted lists. Because of the many documents indexed by such engines and the stringent performance requirements imposed by the heavy load of queries, the inverted index stores billions of integers that must be searched efficiently. In this scenario, index compression is essential because it leads to a better exploitation of the computer memory hierarchy for faster query processing and, at the same time, allows reducing the number of storage machines. The aim of this article is twofold: first, surveying the encoding algorithms suitable for inverted index compression and, second, characterizing the performance of the inverted index through experimentation.
Similar resources
Inverted Index Compression
The data structure at the core of today's large-scale search engines, social networks and storage architectures is the inverted index, which can be regarded as a collection of sorted integer sequences called inverted lists. Because of the many documents indexed by search engines and the stringent performance requirements dictated by the heavy load of user queries, the inverted lists often st...
Inverted Index Compression
The data structure at the core of today's large-scale search engines, social networks, and storage architectures is the inverted index. Given a collection of documents, consider for each distinct term t appearing in the collection the integer sequence ℓ_t, listing in sorted order all the identifiers of the documents (docIDs in the following) in which the term appears. The sequence ℓ_t is called ...
On Inverted Index Compression for Search Engine Efficiency
Efficient access to the inverted index data structure is a key aspect for a search engine to achieve fast response times to users’ queries. While the performance of an information retrieval (IR) system can be enhanced through the compression of its posting lists, there is little recent work in the literature that thoroughly compares and analyses the performance of modern integer compression sch...
Cluster based Mixed Coding Schemes for Inverted File Index Compression
One way to improve inverted file compression is to use the cluster property [1] of a document collection, which states that term occurrences are not uniformly distributed: some terms are used more frequently in some parts of the collection than in others. The corresponding part of the inverted list will consequently consist of clustered small d-gap values. Interpolative code [9] exploits the cluster prop...
Optimize Document Identifier Assignment for Inverted Index Compression
Document identifier assignment is a technique for inverted file index compression that works by reducing the d-gap values of posting lists. Existing studies have approached it with either TSP or clustering methods. However, there is no proper formulation for this problem, and the existing approaches have no theoretical guarantee of being good approximations. In this paper, we first formulate document identifier assignmen...
Journal
Journal title: ACM Computing Surveys
Year: 2021
ISSN: ['0360-0300', '1557-7341']
DOI: https://doi.org/10.1145/3415148